GO-PCA: An Unsupervised Method to Explore Gene Expression Data Using Prior Knowledge

نویسندگان

  • Florian Wagner
  • Xia Li
چکیده

METHOD Genome-wide expression profiling is a widely used approach for characterizing heterogeneous populations of cells, tissues, biopsies, or other biological specimen. The exploratory analysis of such data typically relies on generic unsupervised methods, e.g. principal component analysis (PCA) or hierarchical clustering. However, generic methods fail to exploit prior knowledge about the molecular functions of genes. Here, I introduce GO-PCA, an unsupervised method that combines PCA with nonparametric GO enrichment analysis, in order to systematically search for sets of genes that are both strongly correlated and closely functionally related. These gene sets are then used to automatically generate expression signatures with functional labels, which collectively aim to provide a readily interpretable representation of biologically relevant similarities and differences. The robustness of the results obtained can be assessed by bootstrapping. RESULTS I first applied GO-PCA to datasets containing diverse hematopoietic cell types from human and mouse, respectively. In both cases, GO-PCA generated a small number of signatures that represented the majority of lineages present, and whose labels reflected their respective biological characteristics. I then applied GO-PCA to human glioblastoma (GBM) data, and recovered signatures associated with four out of five previously defined GBM subtypes. My results demonstrate that GO-PCA is a powerful and versatile exploratory method that reduces an expression matrix containing thousands of genes to a much smaller set of interpretable signatures. In this way, GO-PCA aims to facilitate hypothesis generation, design of further analyses, and functional comparisons across datasets.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

GO-PCA: An Unsupervised Method to Explore Biological Heterogeneity Based on Gene Expression and Prior Knowledge

Genome-wide expression profiling is a cost-efficient and widely used method to characterize heterogeneous populations of cells, tissues, biopsies, or other biological specimen. The exploratory analysis of such datasets typically relies on generic unsupervised methods, e.g. principal component analysis or hierarchical clustering. However, generic methods fail to exploit the significant amount of...

متن کامل

An Unsupervised Learning Method for an Attacker Agent in Robot Soccer Competitions Based on the Kohonen Neural Network

RoboCup competition as a great test-bed, has turned to a worldwide popular domains in recent years. The main object of such competitions is to deal with complex behavior of systems whichconsist of multiple autonomous agents. The rich experience of human soccer player can be used as a valuable reference for a robot soccer player. However, because of the differences between real and simulated soc...

متن کامل

A Simple and Efficient Supervised Method for Spatially Weighted PCA in Face Image Analysis

Principal Component Analysis (PCA) is an example of a successful unsupervised statistical dimensionality reduction method, especially in small sample size problems. Despite the well-known attractive properties of PCA, the traditional approach does not incorporate prior information extracted from a specific domain knowledge. The development of techniques that bring together dimensionality reduct...

متن کامل

Alert correlation and prediction using data mining and HMM

Intrusion Detection Systems (IDSs) are security tools widely used in computer networks. While they seem to be promising technologies, they pose some serious drawbacks: When utilized in large and high traffic networks, IDSs generate high volumes of low-level alerts which are hardly manageable. Accordingly, there emerged a recent track of security research, focused on alert correlation, which ext...

متن کامل

Effects of blastocyst artificial collapse prior to vitrification on hatching and ‎survival rates and the expression of klf4 gene in mouse embryos

Although the rate of blastocysts implantation of embryos is higher than previous stages but their survival rate is lower than them, which could be attributed to the completely filled blastocoel cavity with liquid and increased possibility of the formation of ice crystals. This liquid could prevent the penetration of cryoprotecting materials into the embryos. In this study, we reduced the volume...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره 10  شماره 

صفحات  -

تاریخ انتشار 2015